For stewards of data in which Indigenous peoples have rights (i.e., Indigenous data), ethical data stewardship requires honoring such rights. Theorists of Indigenous data sovereignty exhort stewards to honor these rights by aligning with the CARE principles of Indigenous data governance (Kukutai & Taylor, 2016; Carroll, 2020; Taitingfong, forthcoming). The Neotoma Paleoecology Database is working to align with the CARE principles through its participation in the Ethical Open Science for Past Global Change Data Research Coordination Network.
Neotoma holds paleodata of many proxy types (e.g., pollen, vertebrate fauna, diatoms) from across the world (Williams et al., 2018). Expertise concerning the different proxies stewarded by Neotoma is distributed among different kinds of specialists (e.g., palynologists, vertebrate paleoecologists, diatomists), and knowledge of all the datasets in Neotoma is likewise distributed. Because they derive from Indigenous lands, or concern the human and nonhuman relations of Indigenous peoples, some of these datasets are Indigenous. So far, however, there is no complete record of the extent of these Indigenous data in Neotoma.
To achieve alignment with CARE, we need a more complete accounting of 1) what Indigenous data Neotoma currently stewards and 2) whether Neotoma’s stewardship of those data conforms to best practice. We present here an account of our attempt to perform such a data audit, as well as our preliminary findings.
We searched programmatically through Neotoma’s metadata for any sensitive Indigenous data. Such sensitive Indigenous data include those that concern Indigenous ancestors (e.g., human skeletal elements and human-derived radiocarbon dates) or derive from sacred sites (e.g., burial mounds). We also searched for any records in which purposeful imprecision has been introduced into the site coordinates, because introducing imprecision is one way to protect sensitive sites. Lastly, we searched for datasets that were collected from federally recognized Indigenous lands, since these may be most relevant to the nation-building activities of tribal governments.
We documented sensitive Indigenous records, fuzzed sites, and data from federally recognized Indigenous lands. Although limitations in Neotoma’s metadata led us to systematically undercount or overcount records in different circumstances, we are hopeful that this preliminary inventory will spur us to better steward these data.
Note: the following audit contains potentially sensitive information about Indigenous ancestors. Our intent is to expose this information in order to work toward better management in the future.
Our audit has two prongs. First, we intend to surface those data that most directly manifest a connection to colonial violence. In the case of Neotoma, these data include records of human skeletal elements from Indigenous lands, as well as radiocarbon dates that derive from such elements. These records stigmatize and objectify Indigenous ancestors. These data also include records from culturally sensitive areas, such as Indigenous burial mounds. Ethical investigation of such sensitive sites requires consultation with local tribal nations and adherence to cultural protocols, which have often been lacking in paleoecological studies to date. Conversely, as part of this first prong, we think it useful to surface those Neotoma records that exemplify CAREful research practice. These include records that introduce purposeful imprecision into site coordinates in order to protect culturally sensitive locations, as well as data deriving from studies that explicitly document a process of collaboration with local tribal nations.
The second prong of our audit is meant to surface those Neotoma records that are useful to tribal nations for their self-governance (the “data for governance” subprinciple of CARE). These include any Neotoma records that derive from federal Indigenous lands.
We searched for the following kinds of records:
We downloaded Neotoma’s taxa table and selected any taxon IDs which might describe people. See table below. (Taxon ID 6359 is Primates, and 6171 is Mammalia.)
Then we used a Neotoma API to search for any occurrences of those taxon IDs.
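The taxon selection and API search can be sketched as follows. This is a minimal illustration rather than the audit's actual code: the name pattern, the column names `taxonid` and `taxonname`, and the endpoint URL with its `taxonid`, `limit`, and `offset` parameters are all assumptions that should be checked against Neotoma's API documentation. The taxon IDs in the usage example are placeholders.

```python
import re
from urllib.parse import urlencode

# Assumption: the occurrences endpoint of Neotoma's public API.
OCCURRENCES_ENDPOINT = "https://api.neotomadb.org/v2.0/data/occurrences"

# Hypothetical pattern; the audit's real selection criteria are in its own table.
HUMAN_NAME_PATTERN = re.compile(r"\bhomo\b|\bhuman", re.IGNORECASE)


def select_candidate_taxa(taxa_rows):
    """Return the IDs of taxa whose name matches the (hypothetical) pattern."""
    return [
        row["taxonid"]
        for row in taxa_rows
        if HUMAN_NAME_PATTERN.search(row["taxonname"])
    ]


def occurrence_url(taxon_id, limit=100, offset=0):
    """Build a query URL asking for occurrences of a single taxon ID."""
    query = urlencode({"taxonid": taxon_id, "limit": limit, "offset": offset})
    return f"{OCCURRENCES_ENDPOINT}?{query}"


# Usage with placeholder rows (IDs and names are illustrative only).
taxa = [
    {"taxonid": 1, "taxonname": "Homo sapiens"},
    {"taxonid": 2, "taxonname": "Picea"},
    {"taxonid": 3, "taxonname": "cf. Homo"},
    {"taxonid": 4, "taxonname": "Homotherium"},
]
candidate_ids = select_candidate_taxa(taxa)
urls = [occurrence_url(taxon_id) for taxon_id in candidate_ids]
```

Any pattern of this kind will still need a manual pass, since a name match alone does not establish that a taxon describes people.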
The map below shows the sites where human samples come from, and the table documents what information there is about those samples. Rows colored red are sensitivity level 1 because they come from North America. Rows colored orange are sensitivity level 2 because they come from elsewhere.
It should be noted that lead FAUNMAP steward Jessica Blois has removed all sample-level Homo sapiens occurrences from public access as FAUNMAP works on a policy for managing these data.
The table below counts sample records by sensitivity and constituent database.
Our next steps are to reach out to the lead stewards for the Faunal Isotope Database, PaVeLa and FAUNMAP, so they can come to a decision about managing these human records in their databases.
We searched through two fields (notes and materialdated) from Neotoma’s geochronology table for any occurrences of words from the dictionary below.
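A dictionary search of this kind can be sketched as below. The field names `notes` and `materialdated` come from the text; the word list, the row layout, and the `geochronid` key are hypothetical stand-ins, and the matching rule (case-insensitive, anchored at the start of a word) is an assumption rather than the audit's documented method.

```python
import re

# Hypothetical dictionary; the audit's actual word list is given in its own table.
SEARCH_TERMS = ["human", "burial", "grave", "mound"]


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any term at a word start."""
    escaped = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{escaped})", re.IGNORECASE)


def flag_rows(rows, fields, pattern):
    """Return (row, matched_terms) pairs for rows where any field has a hit."""
    flagged = []
    for row in rows:
        # Treat missing or null fields as empty strings.
        text = " ".join(str(row.get(field) or "") for field in fields)
        hits = sorted({match.group(0).lower() for match in pattern.finditer(text)})
        if hits:
            flagged.append((row, hits))
    return flagged


# Usage with placeholder rows.
rows = [
    {"geochronid": 1, "notes": "Human bone collagen", "materialdated": None},
    {"geochronid": 2, "notes": "", "materialdated": "charcoal"},
    {"geochronid": 3, "notes": None, "materialdated": "wood from burial mound"},
]
flagged = flag_rows(rows, ["notes", "materialdated"], build_pattern(SEARCH_TERMS))
```

Because the pattern matches word prefixes, it also catches words such as "gravel", so a search like this over-flags and every hit still needs individual review.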
All rows from the geochronology table that contained one of the above words are listed in the table below. Note that these radiocarbon dates are only potentially problematic, not necessarily so, and further scrutiny may be needed. (We also checked against CARD’s list of radiocarbon dates deriving from human ancestors that are duplicated in Neotoma, and the two lists agreed: all 60 of CARD’s records that are also in Neotoma appear in the table below.)
We assigned sensitivity categories as follows: any references to human bone were assigned sensitivity level 1. Any references to human feces were assigned sensitivity level 2. References to human graves or burials also merited a 2. All other items were given sensitivity level 3. We also consulted all publications linked to records in which the material dated was taxon-ambiguous bone collagen. We found that geochron IDs 21255, 29333, 29334, and 29335 definitely derive from humans. These records were therefore categorized as sensitivity level 1.
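The rules above can be expressed as a small function. This is a simplified sketch: the function name and the keyword tests are assumptions standing in for what was a reading of each record, while the four geochron IDs treated as manual overrides are the ones named in the text.

```python
import re

# Geochron IDs confirmed as human-derived after consulting the publications.
MANUAL_LEVEL_1 = {21255, 29333, 29334, 29335}

HUMAN = re.compile(r"\bhumans?\b", re.IGNORECASE)
BONE = re.compile(r"\bbones?\b", re.IGNORECASE)
FECES = re.compile(r"\bfeces\b", re.IGNORECASE)
GRAVE_OR_BURIAL = re.compile(r"\b(?:graves?|burials?)\b", re.IGNORECASE)


def assign_sensitivity(geochron_id, text):
    """Return sensitivity level 1, 2, or 3 for one geochronology record."""
    if geochron_id in MANUAL_LEVEL_1:
        return 1
    mentions_human = bool(HUMAN.search(text))
    if mentions_human and BONE.search(text):
        return 1  # human bone
    if mentions_human and FECES.search(text):
        return 2  # human feces
    if GRAVE_OR_BURIAL.search(text):
        return 2  # graves or burials
    return 3  # everything else
```

Whole-word matching keeps "gravel" from being read as "grave", but keyword rules remain a first pass. Records such as taxon-ambiguous bone collagen still have to be resolved by checking the linked publication.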
Below the color-coded table, we count records by their sensitivity and the constituent database of which they are a part.
Our next steps are to reach out to the stewards of the relevant constituent databases and ask them to come to a decision about managing these records.
We used the same dictionary as in the previous query to search through two fields in Neotoma’s collection units table (location and notes). Any collection units that returned one of the above words are reproduced below. The records were individually scrutinized and subjectively assigned to sensitivity categories.
We counted the number of records by their constituent database and by their sensitivity. Note that the count here is greater than the total number of collection units because constituent databases are linked to datasets, not collection units, and multiple datasets can derive from a single collection unit. (We excluded the Neotoma datasettype “geochronologic”.)
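To illustrate why the tally can exceed the number of collection units, here is a minimal sketch of a count made at the dataset level. The row layout, the key names `collectionunitid` and `database`, and the database names are hypothetical. Only the exclusion of the "geochronologic" dataset type comes from the text.

```python
from collections import Counter


def count_by_database_and_sensitivity(unit_sensitivity, datasets):
    """Count datasets per (database, sensitivity), skipping geochronologic ones.

    unit_sensitivity maps a collection unit ID to its sensitivity level;
    each dataset row links one dataset to a collection unit and a database.
    """
    counts = Counter()
    for dataset in datasets:
        if dataset["datasettype"] == "geochronologic":
            continue
        level = unit_sensitivity.get(dataset["collectionunitid"])
        if level is None:
            continue  # dataset belongs to a collection unit that was not flagged
        counts[(dataset["database"], level)] += 1
    return counts


# Two flagged collection units, but three countable datasets.
unit_sensitivity = {10: 1, 11: 2}
datasets = [
    {"collectionunitid": 10, "datasettype": "pollen", "database": "Database A"},
    {"collectionunitid": 10, "datasettype": "charcoal", "database": "Database A"},
    {"collectionunitid": 10, "datasettype": "geochronologic", "database": "Database A"},
    {"collectionunitid": 11, "datasettype": "vertebrate fauna", "database": "Database B"},
    {"collectionunitid": 12, "datasettype": "pollen", "database": "Database A"},
]
counts = count_by_database_and_sensitivity(unit_sensitivity, datasets)
```

Here collection unit 10 contributes two rows to the count because it has two non-geochronologic datasets.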
Our next steps are to reach out to the relevant constituent database stewards for sensitivity levels 1 and 2 and ask them to come to a decision about managing these records.
We count any site whose geography is provided as a bounding box rather than a point as fuzzed because, according to the Neotoma Manual, “the lat-long box can be used either to circumscribe the areal extent of a site or to provide purposeful imprecision to the site location.” Note that this is a liberal definition: some sites with bounding box geographies will have been so formatted for reasons other than purposeful imprecision.
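Under this definition, the test for a fuzzed site reduces to checking whether a site's extent has nonzero width or height. The sketch below assumes each site carries four coordinate fields named `latitudenorth`, `latitudesouth`, `longitudeeast`, and `longitudewest`; these names and the `min_extent` parameter are assumptions for illustration.

```python
def is_bounding_box(site, min_extent=0.0):
    """Return True if the site's geography is a box rather than a single point.

    min_extent is a hypothetical threshold in decimal degrees; the default of
    0.0 reproduces the liberal definition (any box at all counts as fuzzed).
    """
    height = abs(site["latitudenorth"] - site["latitudesouth"])
    width = abs(site["longitudeeast"] - site["longitudewest"])
    return height > min_extent or width > min_extent


# Usage with placeholder sites.
point_site = {
    "latitudenorth": 45.5, "latitudesouth": 45.5,
    "longitudeeast": -93.2, "longitudewest": -93.2,
}
box_site = {
    "latitudenorth": 45.6, "latitudesouth": 45.5,
    "longitudeeast": -93.1, "longitudewest": -93.2,
}
fuzzed = [site for site in (point_site, box_site) if is_bounding_box(site)]
```

A geometric test like this cannot tell a box that records a site's real areal extent from one added for purposeful imprecision, which is why the resulting count is an upper bound.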
We found 5476 fuzzed sites using this method. One table below documents the site names, dataset types, and constituent databases associated with fuzzed sites. The next table counts datasets associated with fuzzed sites by the type of dataset and the constituent database from which the dataset derives, and the map below documents fuzzed site locations.
Our next steps are to refine our definition of fuzzed sites.
We did a spatial join for every site in Neotoma with a unique site ID to shapefiles of the borders of federal Indigenous lands in the United States and Canada, and Indigenous protected areas in Australia, and we tallied and mapped all those which intersected the borders of federal reservations. See list below.
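The core of such a spatial join is a point-in-polygon test. The sketch below is a dependency-free illustration, not the audit's method: it treats longitude and latitude as planar coordinates, represents each land boundary as a single ring of `(lon, lat)` vertices, and ignores holes, multipart polygons, and sites stored as bounding boxes. In practice a GIS library working directly on the shapefiles would handle those cases. All names and coordinates are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is the point inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def sites_on_lands(sites, lands):
    """Return (siteid, land name) pairs for every site inside a land boundary."""
    matches = []
    for site in sites:
        for name, ring in lands.items():
            if point_in_polygon(site["lon"], site["lat"], ring):
                matches.append((site["siteid"], name))
    return matches


# Usage with a placeholder square boundary and two placeholder sites.
lands = {"Example land": [(0, 0), (10, 0), (10, 10), (0, 10)]}
sites = [
    {"siteid": 1, "lon": 5, "lat": 5},
    {"siteid": 2, "lon": 15, "lat": 5},
]
matches = sites_on_lands(sites, lands)
```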
Next, we counted the Neotoma datasets derived from sites on federal Indigenous lands, broken down by the Neotoma constituent database with which they are associated and by dataset type.